Skip to content

feat(config)!: migrate zebrad to use a layered configuration#9768

Merged
gustavovalverde merged 98 commits intomainfrom
feat-config-rs
Aug 27, 2025
Merged

feat(config)!: migrate zebrad to use a layered configuration#9768
gustavovalverde merged 98 commits intomainfrom
feat-config-rs

Conversation

@gustavovalverde
Copy link
Copy Markdown
Member

Motivation

  • Replace Abscissa-based config loading and the Docker TOML generation flow with a standard layered configuration approach.
  • Let users configure via environment variables (ZEBRA_SECTION__KEY) without an entrypoint-generated zebrad.toml.
  • Remove legacy ZEBRA_* test/internal env vars to avoid collisions with the new config-rs prefix.

Solution

  • Config loading

    • Implement ZebradConfig::load() using config-rs (defaults → optional TOML → env).
    • Derive/ensure defaults and deny_unknown_fields in config structs.
  • Docker entrypoint

    • Remove config generation
    • Keep test execution path for cargo nextest.
  • Environment variables (breaking)

    • Stop honoring legacy ZEBRA_* test/path vars. Use:
      • SKIP_NETWORK_TESTS, SKIP_IPV6_TESTS
      • TEST_LIGHTWALLETD, TEST_LARGE_CHECKPOINTS, TEST_SYNC_TO_CHECKPOINT, TEST_SYNC_PAST_CHECKPOINT
      • STATE_CACHE_DIR instead of ZEBRA_CACHE_DIR
      • FORCE_USE_COLOR instead of ZEBRA_FORCE_USE_COLOR
    • Logging: stop using RUST_LOG/ZEBRA_RUST_LOG; use ZEBRA_TRACING__FILTER.
  • Tests and helpers

    • Provide a default network-matched mining.miner_address in the test config helper, selected by ZEBRA_NETWORK__NETWORK or the test’s requested network; still overridable by ZEBRA_MINING__MINER_ADDRESS or TOML.
    • Add zebrad/tests/common/configs/v2.4.2.toml and normalize cache_dir, [rpc].cookie_dir = "cache_dir".
  • Documentation

    • Update generate.rs docs to show ZEBRA_SECTION__KEY usage.
  • Breaking changes

    • Legacy ZEBRA_* test/path environment variables are removed.
    • Docker entrypoint no longer generates/rewrites TOML; configuration is defaults + optional TOML + ZEBRA_* env.

Tests

Our whole test suite must pass

Follow-up Work

  • Remove any remaining Abscissa Configurable trait usage.
  • Update Zebra Book docs for the new configuration model.
  • Update CI to use new env variable names.
  • Validate z3 docker-compose with the new env-only configuration.

PR Checklist

  • The PR name is suitable for the release notes.
  • The solution is tested.
  • The documentation is up to date.

Previously, cache mounts targeted specific subdirectories which did not match
the CARGO_HOME path, leading to ineffective caching and repeated downloads.
This change uses a single cache mount for CARGO_HOME in both test and release
stages, ensuring Cargo can leverage BuildKit's cache for faster rebuilds.

Additionally, introduced CARGO_TARGET_DIR for consistent build output paths,
replaced bind mounts with COPY --link for test stage to ensure source code
availability, and updated cache handling to preserve artifacts across builds.

Fixes #9331
Relates #6169
- Override FEATURES in Dockerfile tests stage with complete feature set needed for all tests
- Convert entrypoint.sh from cargo test to cargo nextest run with --workspace scope
- Update nextest.toml test filters to match actual function names:
  - generate_checkpoints -> generate_checkpoints_mainnet/testnet
  - fake_activation_heights -> with_fake_activation_heights
- Add nextest overrides for lightwalletd and sync test categories
- Remove feature duplication by using centralized FEATURES env var
- Align build and test scopes between Dockerfile and entrypoint to prevent recompilation

This ensures consistent feature sets across build and test phases while
standardizing on nextest for better test execution and filtering.
This commit refactors the testing architecture to remove test-specific feature flags in favor of runtime environment variables and nextest profiles. This simplifies the build process, as a single test binary can be built and used for all test scenarios.

The primary changes are:
- Test-specific features like `lightwalletd-grpc-tests`, `sync_*`, and `zebra-checkpoints` have been removed from `Cargo.toml`.
- The tests formerly guarded by these features are now controlled by `ZEBRA_TEST_*` environment variables. These act as opt-in safety guards, preventing developers from accidentally running long or resource-intensive tests locally.
- CI workflows in `.github/workflows` have been updated. They now use `nextest` profiles to select specific tests and set the corresponding `ZEBRA_TEST_*` variables to activate them.
- The `tonic-build` dependency, required for `lightwalletd` tests, is now correctly configured as a test-only build dependency. It is enabled through the `zebra-test` feature, ensuring it is not included in production release builds. This resolves a build warning related to an unknown `cfg` value.
…ntime controls

- Remove lightwalletd-grpc-tests feature from zebra-test crate
- Move lightwalletd-grpc-tests feature definition in zebrad/Cargo.toml to proper location
- Make tonic-build conditional on lightwalletd-grpc-tests feature in build.rs
- Remove feature gates from test modules, allowing them to compile unconditionally
- Remove feature gate from lightwalletd_test_suite test execution
- Update CI workflow to include lightwalletd-grpc-tests feature for comprehensive testing

Tests now use runtime environment variables (ZEBRA_TEST_*) instead of compile-time
features, allowing a single test binary to run different test suites based on
configuration while keeping production builds lean.
- Add feature gates to lightwalletd test infrastructure to prevent compilation errors when lightwalletd-grpc-tests is disabled
- Add feature gate to indexer test to prevent compilation errors when indexer is disabled
- Move lightwalletd-related imports and constants behind feature gates
- Wrap gRPC code generation in feature-conditional module
- Fix GitHub Actions workflow to pass features as single string
- Restore missing lightwalletd_failure_messages method with feature gate
- Add missing DATABASE_FORMAT_UPGRADE_IS_LONG import

This ensures --no-default-features builds work correctly while maintaining full functionality when features are enabled.
- Remove unused DATABASE_FORMAT_UPGRADE_IS_LONG import
- Update references to use common::cached_state::DATABASE_FORMAT_UPGRADE_IS_LONG
- Fix unused import linting warning
- Add `config = { version = "0.15.14", features = ["toml", "json"] }` to `[workspace.dependencies]`
- Switch `zebrad` to `config = { workspace = true }`
Add `ZebradConfig::load_with_env(config_path, env_prefix)` to allow loading a
config from environment variables using a caller-specified prefix. `load(...)`
delegates to this with the default `ZEBRA` prefix so existing behavior is
unchanged.

Update the `copy-state` command to load its target config with the
`ZEBRA_TARGET_...` prefix (e.g. `ZEBRA_TARGET_STATE__CACHE_DIR`) so source and
target environment overrides do not conflict.
- Adds `ZebradConfig::load_with_env` to load configs using a specified
  environment variable prefix.
- Updates the `copy-state` command to use the `ZEBRA_TARGET_` prefix for
  its target configuration, preventing conflicts with the source config's
  `ZEBRA_` environment variables.
- Improves test reliability by handling mutex poisoning in `EnvGuard`,
  which prevents a single test failure from cascading.
- Fixes a bug where environment variable overrides were not being correctly
  applied in tests due to incorrect prefix handling.
- Updates documentation and test profiles to reflect these changes.
@arya2
Copy link
Copy Markdown
Contributor

arya2 commented Aug 25, 2025

This regtest_block_templates_are_valid_block_submissions() test failure is due to it using Mainnet instead of Regtest in the spawned test child. The environment variable is overriding the config values written to the test child's temp directory.

This downgrade_state_format() test failure looks like it's passing for Mainnet but failing on Testnet, likely the same issue as it's also writing the config to the test child's temp directory, and as the test is passing for Mainnet.

We probably don't want to define ZEBRA_NETWORK__NETWORK in the environment of the test-all job.

I think NETWORK was previously used in entrypoint.sh as part of the base config, so writing a config would override the value, but now it's in the environment and it overrides any value that's written to a config.

Copy link
Copy Markdown
Contributor

@arya2 arya2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good, thank you for making all these changes.

arya2 and others added 10 commits August 26, 2025 04:14
* Removes `ZEBRA_NETWORK__NETWORK` env var from `Run all tests`

* removes duplicate 'and'
Co-authored-by: Arya <aryasolhi@gmail.com>
Split `sync-to-mandatory-checkpoint` nextest profile into
`sync-to-mandatory-checkpoint-mainnet` and
`sync-to-mandatory-checkpoint-testnet` so each profile runs a single
network-specific test.

Update CI job `sub-ci-integration-tests-gcp.yml`
to select `NEXTEST_PROFILE` dynamically per network so only the matching
profile runs in a workflow invocation. This prevents concurrent tests for
different networks from sharing the same cache dir and creating race conditions.
- Add `ZEBRA_ENV_PREFIX` const and use it in `EnvGuard::new` and the `Drop` impl.
- Replace `assert!(result.is_err())` with `.expect_err(...)` in tests that don't inspect the error value.
Co-authored-by: Arya <aryasolhi@gmail.com>
Copy link
Copy Markdown
Contributor

@arya2 arya2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look great, thank you!

Opened #9842 to track non-blocking comments that we should revisit.

@arya2
Copy link
Copy Markdown
Contributor

arya2 commented Aug 27, 2025

@Mergifyio refresh

@mergify
Copy link
Copy Markdown
Contributor

mergify bot commented Aug 27, 2025

refresh

✅ Pull request refreshed

@gustavovalverde
Copy link
Copy Markdown
Member Author

I'll admin merge to be able to clean the commit message when the PR merges

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-devops Area: Pipelines, CI/CD and Dockerfiles A-rust Area: Updates to Rust code C-breaking Category: A breaking change for users C-enhancement Category: This is an improvement C-feature Category: New features I-usability Zebra is hard to understand or use P-High 🔥

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants